ATLAS - A New Text Alignment Architecture

Author

  • Bettina Schrader
Abstract

We present a new hybrid architecture for aligning bilingual, linguistically annotated parallel corpora. It aligns simultaneously at the paragraph, sentence, phrase and word levels, using statistical and heuristic cues together with linguistics-based rules. The system currently aligns English and German texts, and the linguistic annotation it uses covers POS tags, lemmas and syntactic constituents. Because the system is highly modular, however, it can easily be adapted to new language pairs and other types of annotation. Its hybrid nature allows experiments with a variety of alignment cues, aimed at solving word alignment problems such as the correct alignment of rare words and multiwords, or alignment despite syntactic differences between the two languages. First performance tests are promising, and we are setting up a gold standard for a thorough evaluation of the system.
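The abstract does not say how ATLAS weighs its cues against each other. As a rough illustration only, the following Python sketch shows one simple way a hybrid word aligner can combine cues: every candidate English-German word pair receives a weighted sum of a dictionary cue on lemmas, a POS-based rule, an orthographic-similarity heuristic and a relative-position heuristic, and pairs are then linked greedily one-to-one above a threshold. The `Token` class, the toy `LEXICON`, the cue functions, the weights and the threshold are all hypothetical and are not taken from ATLAS; the sketch also ignores the phrase, sentence and paragraph levels and multiword (n:m) links.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Token:
    word: str
    lemma: str
    pos: str  # coarse POS tag, e.g. "NOUN", "VERB" (hypothetical tag set)


# Hypothetical toy English-German lemma lexicon, not part of ATLAS.
LEXICON = {("the", "der"), ("house", "Haus"), ("be", "sein"), ("old", "alt")}

# Hypothetical cue weights; a real system would tune or learn these.
WEIGHTS = {"lexicon": 0.5, "pos": 0.2, "similarity": 0.15, "position": 0.15}


def lexicon_cue(s: Token, t: Token) -> float:
    """Dictionary cue: 1.0 if the lemma pair is listed in the lexicon."""
    return 1.0 if (s.lemma, t.lemma) in LEXICON else 0.0


def pos_cue(s: Token, t: Token) -> float:
    """Linguistic rule: tokens with the same coarse POS are likelier to correspond."""
    return 1.0 if s.pos == t.pos else 0.0


def similarity_cue(s: Token, t: Token) -> float:
    """Heuristic: orthographic similarity, useful for cognates and names."""
    return SequenceMatcher(None, s.word.lower(), t.word.lower()).ratio()


def position_cue(i: int, j: int, n: int, m: int) -> float:
    """Heuristic: prefer pairs at similar relative positions in their sentences."""
    return 1.0 - abs(i / max(n - 1, 1) - j / max(m - 1, 1))


def align_words(src: list[Token], tgt: list[Token], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Score every pair with a weighted sum of cues, then link greedily 1:1."""
    scored = []
    for i, s in enumerate(src):
        for j, t in enumerate(tgt):
            score = (WEIGHTS["lexicon"] * lexicon_cue(s, t)
                     + WEIGHTS["pos"] * pos_cue(s, t)
                     + WEIGHTS["similarity"] * similarity_cue(s, t)
                     + WEIGHTS["position"] * position_cue(i, j, len(src), len(tgt)))
            scored.append((score, i, j))
    links, used_src, used_tgt = [], set(), set()
    # Best-scoring pairs first; each token may take part in at most one link.
    for score, i, j in sorted(scored, reverse=True):
        if score >= threshold and i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


if __name__ == "__main__":
    en = [Token("The", "the", "DET"), Token("house", "house", "NOUN"),
          Token("is", "be", "VERB"), Token("old", "old", "ADJ")]
    de = [Token("Das", "der", "DET"), Token("Haus", "Haus", "NOUN"),
          Token("ist", "sein", "VERB"), Token("alt", "alt", "ADJ")]
    print(align_words(en, de))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

A weighted sum is only the simplest way to combine cues; a hybrid system might instead let a linguistic rule veto a link outright, or use alignments found at one level (for example sentences) to constrain the search at another.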


Similar resources

Structural dynamics in northern Atlas of Tunisian, Jendouba area: insights from geology and gravity data

This paper presents a new interpretation of the geometry of the Triassic alignment of J. Sidi Mahdi–J. Zitoun in the Medjerda Valley Plain (Northern Tunisia), based on a detailed analysis of gravity and seismic reflection data. The main results of the gravity analysis do not show a distinct gravity anomaly over the Triassic evaporite bodies. The positive gravity anomaly seems to be related to the entire stru...


Architecture Narration: A Comparative Study on Narration in Architecture and Story

The way architects think about different issues from developing plans, perspectives, and views to cross-sections and structure of a building is a common and general one. Regardless of its merits and efficiency, this way of thinking indicates a degradation in architectural thinking. Indeed, architectures today are caught in a specific architecture language where the boundaries of language create...


Constructing 2D Curve Atlases

We present an approach to computing a curve atlas based on deriving a correspondence between two curves. This correspondence is based on a notion of an alignment curve and on a measure of similarity between the intrinsic properties of the curve, namely, length and curvature. The optimal correspondence is found by an efficient dynamic-programming method. This is then used to compute an average fo...


A System Architecture for Parallel Corpus-based Grammar Learning

This paper describes an architecture for exploiting implicit information about the grammar of the languages included in a parallel corpus. By initially applying statistical word alignment and defining an appropriate representation format for cross-linguistic structural correspondence, this implicit information can feed a system for bootstrapping grammars. The proposed architecture will be under...


The Compilation of Urbanism Texts by Using the Iranian's-Valuable Texts (With Emphasis on the Islamic Ethics)

It is clear that each community should have its own specific urbanism science. Science localization is an obvious matter. This matter has motivated Iranian researchers in the urbanism field to naturalize the urbanism science that has been imported to Iran. One method for producing or indigenizing urbanism texts in Iran, especially in recent years, is the utilization of Iranian-valuable texts. There are high...




Publication date: 2006